The concept of biased data is well known and its practical applications
range from social sciences and biology to economics and quality control.
These observations arise when a sampling procedure chooses an observation
with probability that depends on the value of the observation. This is an
interesting sampling procedure because it favors some observations and
neglects others. It is known that biasing does not change rates of
nonparametric density estimation, but no results are available about sharp
constants. This article presents asymptotic results on sharp minimax
density estimation. In particular, a coefficient of difficulty is
introduced that shows the relationship between sample sizes of direct and
biased samples that imply the same accuracy of estimation. The notion of
the restricted local minimax, where a low-frequency part of the estimated
density is known, is introduced; it sheds new light on the phenomenon of
nonparametric superefficiency. Results of a numerical study are presented.

1. Introduction. Assume that we wish to estimate the probability density
$f$ of a random variable $X$. If independent direct realizations
$X_1, X_2, \ldots, X_n$ of $X$ are available, then optimal solutions of the
problem are well known. See the discussion in the books by Devroye and
Györfi (1985), Silverman (1986) and Efromovich (1999).

In practice it may happen that drawing a direct sample from $X$ is
impossible. Instead, an observation $X = x$ may be included with a relative
chance proportional to a so-called biasing function $w(x)$. Then
independently recorded biased observations $Y_1, Y_2, \ldots, Y_n$ have the
density

(1.1)   $g(y) = w(y) f(y) / \mu(f)$, where $\mu(f) = E_f\{w(X)\}$.

S. EFROMOVICH

The distribution of a corresponding random variable $Y$ is called a biased
distribution and its density $g$ is given in (1.1). Given the biased sample
$Y_1, Y_2, \ldots, Y_n$ and the biasing function $w$, the problem is to
estimate the underlying density $f$ with minimal mean integrated squared
error over a finite interval of interest.

In what follows it is always assumed that the interval of interest is the
unit interval $[0, 1]$, $0 < c_1 < w(y) < c_2 < \infty$, and $w(y)$ is
Riemann integrable over the unit interval.
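
To make the sampling mechanism behind (1.1) concrete, the following minimal Python sketch simulates biased observations by acceptance-rejection: a direct draw $X = x$ is kept with probability $w(x)/w_{\max}$. The helper name `draw_biased_sample`, the uniform underlying density and the bound `w_max` are illustrative assumptions and are not taken from the article.

```python
import random

def draw_biased_sample(draw_x, w, w_max, n, rng):
    """Return n biased observations with density proportional to w(y) f(y).

    draw_x(rng) draws one direct observation X ~ f (assumed to be available),
    w is the biasing function and w_max is any upper bound on w.
    """
    sample = []
    while len(sample) < n:
        x = draw_x(rng)
        # Keep x with a relative chance proportional to w(x).
        if rng.random() * w_max <= w(x):
            sample.append(x)
    return sample

rng = random.Random(0)
w = lambda y: 0.1 + 0.9 * y                 # an increasing biasing function
# Hypothetical underlying density: uniform on [0, 1].
ys = draw_biased_sample(lambda r: r.random(), w, 1.0, 20000, rng)
mean_y = sum(ys) / len(ys)
```

For a uniform $f$ and $w(y) = 0.1 + 0.9y$ the biased density is $g(y) = w(y)/0.55$ on $[0, 1]$, whose mean is $0.35/0.55 \approx 0.636$, so `mean_y` should be close to that value rather than to 0.5.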

The following examples illustrate a few general practical settings that
lead to biased data sets. (a) Let a proportion $1 - w(x)$ out of the
natural frequency of $X$ be missing. Then the density of the observed data
is (1.1). Many specific biological examples can be found in the book by
Buckland, Anderson, Burnham and Laake (1993). (b) Visibility bias is a
recognized problem in aerial survey techniques for estimating, for
instance, wildlife population density. Interesting particular examples can
be found in Cook and Martin (1974). (c) A sampling procedure can
specifically favor larger (or smaller) observations. Two classical examples
discussed in Cox (1969) are observing interevent times at some random point
in time and a quality control problem of estimating fiber length
distribution. (d) A rather general example is a damage model where an
observation $X$ may be damaged by a destructive process depending on $X$;
hence undamaged observations are biased. The interested reader can find
more practical examples in the review by Patil and Rao (1977).

Let us also note that in some cases a biased sampling can be a reasonable 
alternative to a direct sampling. As a particular example, consider a study 
sponsored by the National Science Foundation (NSF) and conducted by the 
University of New Mexico on vegetation in the Sevilleta National Wildlife 
Refuge. This refuge lies 65 miles south of Albuquerque in Socorro County 
and includes a desert. Of particular interest is blue grama (Bouteloua
gracilis), which is a native perennial that provides good grazing for wildlife
and livestock; 1-2 in. tall, it grows in tufts, sod or other types of clusters 
of different shapes. In particular, the study is devoted to monitoring the 
distribution of the number of blades in a cluster, and the monitoring is 
based on biannual manual counts. One of the possibilities for performing 
the counting is to choose some areas and then count blades over these areas. 
This approach is manageable (after all, we are talking about desert), but 
experiments show that then the data are contaminated by large measure- 
ment errors. Recall that measurement errors make the problem of density 
estimation ill-posed and dramatically worsen accuracy of estimation [see 
Efromovich (1999), Chapter 3]. Thus, instead of direct counting, the area is 
sampled by line transects. This makes observations biased because a larger 
cluster has a larger probability of being intersected; however, practically 
negligible measurement errors make the problem dramatically simpler. 

DENSITY ESTIMATION FOR BIASED DATA

The fundamental result in the theory of biased data is from Cox (1969),
where the following estimator of the cumulative distribution function was
suggested:

(1.2)   $\hat F(x) = \hat\mu\, n^{-1} \sum_{l=1}^{n} w^{-1}(Y_l) I(Y_l \le x)$,

where

(1.3)   $\hat\mu = \Big[ n^{-1} \sum_{l=1}^{n} w^{-1}(Y_l) \Big]^{-1}$.
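
The estimator (1.2)-(1.3) is simple to compute. The sketch below is a minimal Python illustration; the function names `cox_mu_hat` and `cox_cdf` and the five data values are hypothetical and serve only to show the weighting by $w^{-1}(Y_l)$.

```python
def cox_mu_hat(ys, w):
    """Estimate mu(f) = E_f{w(X)} as in (1.3): n divided by the sum of 1/w(Y_l)."""
    return len(ys) / sum(1.0 / w(y) for y in ys)

def cox_cdf(ys, w, x):
    """Cox estimate (1.2) of the distribution function F at the point x."""
    mu_hat = cox_mu_hat(ys, w)
    return mu_hat * sum(1.0 / w(y) for y in ys if y <= x) / len(ys)

w = lambda y: 0.1 + 0.9 * y
ys = [0.2, 0.5, 0.5, 0.8, 0.9]              # hypothetical biased observations
f_half = cox_cdf(ys, w, 0.5)                # weighted estimate of F(0.5)
f_one = cox_cdf(ys, w, 1.0)                 # equals 1 by construction
unweighted = cox_cdf(ys, lambda y: 2.0, 0.5)  # constant w: the empirical cdf
```

With an increasing $w$, small observations are underrepresented in the biased sample, so they receive larger weights; this is why `f_half` exceeds the raw empirical fraction 3/5, while a constant biasing function returns the usual empirical distribution function.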

For biased data sets, the Cox estimator plays the same role as the empir- 
ical distribution for direct data. An important theoretical property of the 
estimator is that it is a nonparametric maximum likelihood estimator. Thus, 
according to some general results, it is asymptotically efficient in terms of 
dispersion of a corresponding limit process [see the discussion in Gill, Vardi 
and Wellner (1988)]. Cox (1969) also suggested the first consistent kernel 
density estimator motivated by smoothing (1.2). Later many other density 
estimates, including rate optimal ones, were suggested [see the discussion 
in Wu and Mao (1996)]. In particular, it has been established that biasing 
does not affect minimax rates. Interesting results on semiparametric density 
estimation and their applications for moderate sample sizes were obtained 
by Sun and Woodroofe (1997) and Lee and Berger (2001). 

On the other hand, so far no research has been conducted on sharp op- 
timal estimation that, in particular, can shed light on Cox's long-standing 
question about how biasing affects density estimation. Moreover, according 
to Efromovich (1999, 2001), the theory of sharp estimation allows a practi- 
tioner to construct and explain the performance of data-driven estimators 
for small sample sizes. 

This article is organized as follows. The next section presents the main 
theoretical results. These results and their corollaries are discussed in Section 
3. Section 4 provides proofs. 

2. Minimax estimation of differentiable densities. Let us begin by
recalling a known classical result for the case of direct observations.
Suppose that a random variable $X$ is distributed according to a
probability density $f(x)$, $-\infty < x < \infty$, and the problem is to
estimate it with a minimal mean integrated squared error over the unit
interval $[0, 1]$. It is assumed that $f$ is $m$-times differentiable over
$[0, 1]$ and belongs to a corresponding Sobolev set

(2.1)   $S(m,Q) = \Big\{ f(x) : f(u) = \sum_{j=0}^{\infty} \theta_j \varphi_j(u),\ u \in [0,1],\ \sum_{j=1}^{\infty} (\pi j)^{2m} \theta_j^2 \le Q \Big\}$.

Here and in what follows, $\varphi_0(u) = 1$,
$\varphi_j(u) = 2^{1/2} \cos(\pi j u)$, $j > 0$, $m$ is a positive integer
number and $Q$ is a positive real number. Also define the corresponding
class of densities
$\mathcal{H}(m,Q) = \{ f(x) : f(x) \ge 0,\ -\infty < x < \infty,\ \int_{-\infty}^{\infty} f(x)\,dx = 1,\ f \in S(m,Q) \}$.
For this class of densities and the case of $n$ direct observations
$X_1, X_2, \ldots, X_n$, it is known that

(2.2)   $\inf_{\hat f} \sup_{f \in \mathcal{H}(m,Q)} [I_f n]^{2m/(2m+1)} E_f\Big\{ \int_0^1 (\hat f(x) - f(x))^2\,dx \Big\} \ge 1 + o(1)$,

where the infimum is taken over all possible estimates $\hat f$ based on
the data set and the parameters $m$ and $Q$, and

(2.3)   $I_f = Q^{-1/(2m)} [\pi (m+1) m^{-1} (2m+1)^{-1/(2m)}] \Big/ \int_0^1 f(x)\,dx$.

Moreover, there exist data-driven estimators that attain the lower bound
(2.2) [see the discussion in Efromovich (1999), Chapter 7]. Note that a
typical case considered in the literature is where $[0, 1]$ is the support
and thus the denominator in (2.3) is equal to 1.
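
As a worked illustration of (2.2)-(2.3), the following Python sketch evaluates $I_f$ and the leading term $(I_f n)^{-2m/(2m+1)}$ of the minimax mean integrated squared error. The function names and the argument `mass_on_unit_interval` (standing for $\int_0^1 f(x)\,dx$) are assumptions introduced here for illustration.

```python
import math

def information_coefficient(m, Q, mass_on_unit_interval=1.0):
    """I_f from (2.3); mass_on_unit_interval is the integral of f over [0, 1]."""
    return (Q ** (-1.0 / (2 * m)) * math.pi * (m + 1) / m
            * (2 * m + 1) ** (-1.0 / (2 * m)) / mass_on_unit_interval)

def sharp_mise(n, m, Q, mass_on_unit_interval=1.0):
    """Leading term (I_f n)^(-2m/(2m+1)) of the minimax MISE implied by (2.2)."""
    i_f = information_coefficient(m, Q, mass_on_unit_interval)
    return (i_f * n) ** (-2.0 * m / (2 * m + 1))

i_f = information_coefficient(1, 1.0)       # m = 1, Q = 1, support [0, 1]
ratio = sharp_mise(800, 1, 1.0) / sharp_mise(100, 1, 1.0)
```

For $m = 1$ and $Q = 1$ the coefficient is $2\pi/\sqrt{3}$, and multiplying the sample size by 8 reduces the leading term by the factor $8^{-2/3} = 1/4$, which is the familiar rate $n^{-2m/(2m+1)}$ at work.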

The approach used is called global minimax because an estimated density
can be any Sobolev function. On the other hand, in practical applications
an underlying density is always fixed. To bridge these two settings,
Golubev (1991) suggested introducing a fixed density $f_0$, not necessarily
a Sobolev one, and assuming that all possible underlying densities are
uniformly close to it on the unit interval.

Namely, let $f_0$ be a density on $(-\infty, \infty)$ that is continuous
and bounded below from zero on the interval $[0, 1]$. No assumption about
$f_0(x)$ for $x$ beyond the unit interval is made. Introduce a class of
densities
$D(m,Q,f_0,\rho) = \{ f : \int_{-\infty}^{\infty} f(x)\,dx = 1,\ f(x) \ge 0,\ f(u) = f_0(u) + t(u),\ 0 \le u \le 1,\ t \in S(m,Q),\ \sup_{x \in [0,1]} |t(x)| < \rho \}$.
Then the problem is to construct a minimax estimate for this set.

Let us present a lower bound for the local minimax approach and the case
of biased data. In this case, observations $Y_1, \ldots, Y_n$ of a biased
random variable $Y$ are given, the density $g(y)$ of $Y$ is defined in
(1.1), and we recall that assumptions about the given biasing function $w$
are formulated below (1.1). Define

(2.4)   $I_f^* = I_f / \mathrm{RCDB}$.

Here RCDB is the relative coefficient of difficulty due to biasing:

(2.5)   $\mathrm{RCDB} = \int_{-\infty}^{\infty} f(x) w(x)\,dx \int_0^1 f(x) w^{-1}(x)\,dx \Big/ \int_0^1 f(x)\,dx$.
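
The coefficient (2.5) involves only three integrals, so it can be evaluated numerically for any given pair $(f, w)$. The following Python sketch uses a plain midpoint rule; the helper names `integrate` and `rcdb`, the `support` argument and the uniform test density are assumptions made for this illustration.

```python
def integrate(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def rcdb(f, w, support):
    """Relative coefficient of difficulty due to biasing, as in (2.5).

    support = (lo, hi) is an interval outside which f is assumed to vanish.
    """
    lo, hi = support
    mu = integrate(lambda x: f(x) * w(x), lo, hi)      # E_f{w(X)}
    a, b = max(lo, 0.0), min(hi, 1.0)                  # part of [0, 1] with mass
    inv = integrate(lambda x: f(x) / w(x), a, b)       # integral of f / w on [0, 1]
    mass = integrate(f, a, b)                          # integral of f on [0, 1]
    return mu * inv / mass

uniform = lambda x: 1.0                                # hypothetical density on [0, 1]
value = rcdb(uniform, lambda y: 0.1 + 0.9 * y, (0.0, 1.0))
no_bias = rcdb(uniform, lambda y: 3.0, (0.0, 1.0))
```

For the uniform density and $w(y) = 0.1 + 0.9y$ the exact value is $0.55 \ln(10)/0.9 \approx 1.407$, while a constant biasing function gives RCDB = 1, that is, no loss relative to direct sampling.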

Theorem 1. For any $\rho > 0$,

(2.6)   $\inf_{\hat f} \sup_{f \in D(m,Q,f_0,\rho)} [I_f^* n]^{2m/(2m+1)} E_f\Big\{ \int_0^1 (\hat f(x) - f(x))^2\,dx \Big\} \ge 1 + o(1)$,

where the infimum is taken over all possible estimates $\hat f$ based on
the biased data set $Y_1, \ldots, Y_n$, the density $f_0$, the biasing
function $w$ and parameters $m$, $Q$ and $\rho$.

Note that (2.6) yields a corresponding global lower bound by choosing $f_0$
that is constant on the unit interval. On the other hand, under the local
minimax approach, the set $D$ is not necessarily a subset of the Sobolev
set, nor does $f_0$ necessarily belong to the Sobolev set.

Thus, it is natural to consider a local minimax setting where $f_0$ is more
closely related to the class of possible underlying densities. This goal is
achieved by the restricted minimax setting where $f_0$ belongs to the
Sobolev set and all possible underlying densities have the same
low-frequency part as $f_0$. In what follows, we refer to such $f_0$ as the
anchor density, and we need the following property of $f_0$.

Assumption A (On anchor density). The anchor density $f_0(x)$ is known,
positive and $m$-fold differentiable on $[0, 1]$,
$f_0(u) = \sum_{j=0}^{\infty} \theta_{0j} \varphi_j(u)$, $u \in [0, 1]$,
and $\sum_{j=1}^{\infty} (\pi j)^{2m} \theta_{0j}^2 = Q$. Also, there exists
a sequence $k_s \to \infty$, $s \to \infty$, such that
$\sum_{j > k_s} j^{2m} \theta_{0j}^2 > C_1 k_s^{-C_2}$ for some positive
$C_1$ and $C_2$.

Let us comment on the two parts of the assumption made. The first part
implies that the anchor density is a particular density from the Sobolev
set (2.1) studied under the global minimax setting. Note that either the
statistician may know $Q$ and then choose a corresponding anchor density,
or the statistician may choose an anchor density and then calculate $Q$. The
second part (the part about the existence of $k_s$) assumes that the anchor
density is not too smooth and thus it is a typical density from the Sobolev
set. For instance, let us check that the assumption about $k_s$ holds
whenever $\sum_{j>0} j^{2m+\alpha} \theta_{0j}^2 = \infty$ for some
$\alpha > 0$. Indeed, if no sequence $k_s$ exists for a particular
$C_2 = 1 + 2\alpha$, then
$\sum_{j>k} j^{2m} \theta_{0j}^2 < C k^{-1-2\alpha}$, $C < \infty$, for all
sufficiently large $k$. The last inequality implies
$\theta_{0j}^2 < C j^{-2m-1-2\alpha}$ and thus we get the inequality
$\sum_{j>0} j^{2m+\alpha} \theta_{0j}^2 < \infty$ that contradicts the
assumption made.

The second part of the assumption yields that there always exists an
integer-valued sequence $J_n$, increasing to infinity, such that
$-1 \le J_n \le \ln(n)$ and
$\sum_{j > J_n} j^{2m} \theta_{0j}^2 > C_1 (J_n + 2)^{-C_2}$. From now on
this sequence is assumed to be fixed. Then we introduce the sequence of
low-frequency parts of $f_0$,

(2.7)   $f_{0J_n}(u) = \sum_{j=0}^{J_n} \theta_{0j} \varphi_j(u)$,



and define the vanishing sequence

(2.8)   $q_n = 1 - \sum_{j=1}^{J_n} (\pi j)^{2m} \theta_{0j}^2 / Q$.

In what follows it is assumed that $\sum_{j=a}^{b} c_j = 0$ whenever
$b < a$. Now we are in position to define the restricted density set:

(2.9)   $\mathcal{H}(m,Q,f_0,J_n) = \Big\{ f : f(u) = f_{0J_n}(u) + \sum_{j > J_n} \theta_j \varphi_j(u),\ u \in [0,1],\ f \in S(m,Q),\ f(x) \ge 0,\ \int_{-\infty}^{\infty} f(x)\,dx = 1 \Big\}$.

It is easy to see that $\max_{x \in [0,1]} |f(x) - f_0(x)| = o(1)$
uniformly over $f \in \mathcal{H}(m,Q,f_0,J_n)$; thus the restricted
approach is also local around the anchor density $f_0$. Also, we may set
$J_n = -1$ and then the restricted approach becomes global.

Theorem 2. Let Assumption A hold. Then

(2.10)   $\inf_{\hat f} \sup_{f \in \mathcal{H}(m,Q,f_0,J_n)} [I_f^* q_n^{-1/(2m)} n]^{2m/(2m+1)} E_f\Big\{ \int_0^1 (\hat f(x) - f(x))^2\,dx \Big\} \ge 1 + o(1)$,

where the infimum is taken over all possible estimates $\hat f$ based on
the biased data set $Y_1, \ldots, Y_n$, the anchor density $f_0$, the
biasing function $w$, the parameters $m$, $Q$ and the sequence $J_n$.

The lower bounds (2.6) and (2.10) are attained by the Efromovich-Pinsker
adaptive estimator, which is a blockwise shrinkage estimator defined as
follows. We divide the set of natural numbers into a sequence of
nonoverlapping blocks $G_k$, $k = 1, 2, \ldots$. Then the estimate is

(2.11)   $\hat f(x) = \sum_{k=1}^{K} \Big[ 1 - \frac{\hat d n^{-1}}{|G_k|^{-1} \sum_{j \in G_k} \hat\theta_j^2} \Big] I\Big( |G_k|^{-1} \sum_{j \in G_k} \hat\theta_j^2 > (1 + t_k) \hat d n^{-1} \Big) \sum_{j \in G_k} \hat\theta_j \varphi_j(x)$,

where $|G_k|$ denotes the cardinality of $G_k$, $I(\cdot)$ is the indicator,

(2.12)   $\hat\theta_j = \hat\mu\, n^{-1} \sum_{l=1}^{n} I(0 \le Y_l \le 1) \varphi_j(Y_l) w^{-1}(Y_l)$

is the Cox sample mean estimate of Fourier coefficients and

(2.13)   $\hat d = \hat\mu^2 n^{-1} \sum_{l=1}^{n} I(0 \le Y_l \le 1) w^{-2}(Y_l)$.

A wide variety of blocks $\{G_k\}$ and thresholds $\{t_k\}$ implies sharp
minimaxity [see the discussion in Efromovich (1985, 1999, 2000)]. As an
example, we set $|G_k| = k^2$, $t_k = 1/\ln(k+1)$ and
$K = \lfloor n^{1/9} \ln(n) \rfloor$.
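
The construction (2.11)-(2.13) with the example choice of blocks and thresholds can be sketched in Python as follows. This is a minimal illustration and not the author's software: it assumes that the blocks partition the frequencies $j = 1, 2, \ldots$ in increasing order and that the zero-frequency coefficient $\hat\theta_0$ is kept without shrinkage; the function names and the simulated data are hypothetical.

```python
import math
import random

def phi(j, x):
    """Cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) cos(pi j x)."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * x)

def ep_estimate(ys, w):
    """Return a function x -> blockwise-shrinkage density estimate on [0, 1]."""
    n = len(ys)
    mu_hat = n / sum(1.0 / w(y) for y in ys)                    # (1.3)
    inside = [y for y in ys if 0.0 <= y <= 1.0]
    d_hat = mu_hat ** 2 * sum(w(y) ** -2 for y in inside) / n   # (2.13)

    def theta_hat(j):                                           # (2.12)
        return mu_hat * sum(phi(j, y) / w(y) for y in inside) / n

    K = int(n ** (1.0 / 9.0) * math.log(n))
    coefs = {0: theta_hat(0)}       # assumption: zero frequency is not shrunk
    start = 1
    for k in range(1, K + 1):
        block = range(start, start + k * k)                     # |G_k| = k^2
        start += k * k
        thetas = {j: theta_hat(j) for j in block}
        energy = sum(t * t for t in thetas.values()) / len(block)
        t_k = 1.0 / math.log(k + 1)
        if energy > (1.0 + t_k) * d_hat / n:                    # block is kept
            shrink = 1.0 - (d_hat / n) / energy                 # shrinkage weight
            for j, t in thetas.items():
                coefs[j] = shrink * t
    return lambda x: sum(c * phi(j, x) for j, c in coefs.items())

# Hypothetical data: biased draws from the uniform density on [0, 1].
rng = random.Random(1)
w = lambda y: 0.1 + 0.9 * y
ys = []
while len(ys) < 200:
    x = rng.random()
    if rng.random() <= w(x):
        ys.append(x)
f_hat = ep_estimate(ys, w)
```

Because every cosine element with $j \ge 1$ integrates to zero over $[0, 1]$ and $\hat\theta_0 = 1$ when all observations fall in the unit interval, the returned estimate integrates to 1 in this example, although it is not forced to be nonnegative.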

Theorem 3. The Efromovich-Pinsker estimator satisfies

(2.14)   $\sup [I_f^* n]^{2m/(2m+1)} E_f\Big\{ \int_0^1 (\hat f(u) - f(u))^2\,du \Big\} = 1 + o(1)$,

and if Assumption A holds, then

(2.15)   $\sup_{f \in \mathcal{H}(m,Q,f_0,J_n)} [I_f^* q_n^{-1/(2m)} n]^{2m/(2m+1)} E_f\Big\{ \int_0^1 (\hat f(u) - f(u))^2\,du \Big\} = 1 + o(1)$.

3. Discussion. 

3.1. The minimax approaches. It may be convenient to think about the 
minimax approaches in terms of concepts of game theory. We may think that 
nature (Player I) chooses a density that makes its estimation most difficult 
for the statistician (Player II). Then the main difference between the three 
minimax approaches introduced in Section 2 is in the information available 
to the statistician. Under the global approach, the statistician knows that 
nature chooses a density from a given Sobolev set. Under the local approach, 
the statistician knows that nature chooses a density which is uniformly close 
to a given density. 

Under the restricted approach the statistician knows dramatically more 
about nature's choice. The statistician has the same information as in the 
global game and additionally knows the low-frequency part of nature's choice. 
The latter also makes the game local because an underlying density is
uniformly close to its low-frequency part. In other words, the restricted game is
about estimating the high-frequency part of an underlying Sobolev density. 

A minimax data-driven estimator (i.e., the estimator based only on data
and the biasing function) should perform no worse than the statistician
playing the minimax game. Thus the restricted minimax game is more
challenging for a data-driven estimation. On the other hand, the restricted
game is more rewarding because the rate of mean integrated squared error
convergence is always faster than the global or local minimax rate
$n^{-2m/(2m+1)}$.

It has long been a tradition in the nonparametric literature to study
global and/or local minimax approaches. This explains the familiar slogan
". . . if we are prepared to assume that the unknown density has m
derivatives, then . . . the optimal mean integrated squared error is of
order $n^{-2m/(2m+1)}$ . . . ." The citation is from Silverman [(1986),
page 70]. The results of Section 2 show that this classical rate is optimal
only if data-driven estimates are compared with the statistician playing
global or local minimax games, that is, with the less informed
statistician. Faster rates can be obtained by matching the restricted
minimax game.

Let us make one more remark about the minimax approaches. It is possible
to change the setting a bit and to assume that the underlying density is
known beyond the interval of interest. This makes the average value
$\int_0^1 f(x)\,dx$ known, but this fact does not affect the asymptotics.

3.2. Practical implications of the minimax approaches. At first glance, 
because restricted minimax implies faster asymptotic rates, there is no rea- 
son to study global and local minimax approaches. 

Interestingly, small data sets justify the study of these classical minimax 
approaches. It was shown by Efromovich (1999) that, for small sample sizes 
(up to several hundred), the problem of nonparametric density estimation 
is equivalent to the problem of estimating a low-frequency part of the un- 
derlying density. The reader can check this assertion using the software in 
Efromovich [(1999), Chapter 3]. As a result, the restricted minimax approach
with nonnegative $J_n$ is not applicable for small sample sizes, because it
assumes that a low-frequency part of the underlying density is given. (This
is also the reason behind the construction of $J_n$ that allows us to apply
no restrictions on the underlying density for small $n$.)

The situation changes for moderate and large sample sizes like the ones 
studied in the wavelet literature. For these sample sizes, knowing or not 
knowing a low-frequency part of the density has no significant effect on 
the estimation, and thus the restricted minimax is absolutely appropriate. 
Again, the interested reader can use the software to check this assertion. 

We may conclude that each minimax approach has its own practical ap- 
plications. 

3.3. Restricted minimax and nonparametric superefficiency. The phe- 
nomenon of parametric superefficiency is well known. A famous example 
is the Hodges superefficient estimator that, for normal observations, allows 
us to improve a sample mean estimator (efficient estimator) at any given 
point. This is an interesting theoretical phenomenon; on the other hand,
superefficient estimators are typically not used, because the set of
superefficiency has Lebesgue measure zero and estimation at other points
may worsen. See the discussion in Ibragimov and Khasminskii [(1981),
Section 2.13].

By contrast, it was shown in Brown, Low and Zhao (1997) that, in
nonparametric problems, every curve can be a point of superefficiency.
Their main result, "translated" into our density estimation setting, is
that for any $f \in \mathcal{H}(m,Q)$ there exists an estimator $\hat f_n$
such that

(3.1)   $n^{2m/(2m+1)} E_f\Big\{ \int_0^1 (\hat f_n(x) - f(x))^2\,dx \Big\} = o(1)$.

This result implies a better rate than the classical $n^{-2m/(2m+1)}$. This
explains why Brown, Low and Zhao (1997) refer to (3.1) as the nonparametric
superefficiency.

On the other hand, (3.1) is in agreement with the restricted minimax 
rates. Let us also note that if an underlying density is parametric (it has a 
finite number of nonzero Fourier coefficients), then the Efromovich-Pinsker 
estimator implies the parametric rate $n^{-1}$ of the mean integrated squared
error convergence [see Efromovich (1985)]. This indicates the range of possible
nonparametric rates. 

3.4. Restricted minimax and oracles. An oracle is an estimator that is 
based on both data and the underlying density. The oracle approach means 
a data-driven estimation that mimics the oracle performance [see the discus- 
sion in Efromovich (1999)]. The restricted minimax bridges classical mini- 
max approaches (where underlying densities belong to function spaces) and 
oracle approaches (where the underlying density is given) by assuming that 
the underlying density belongs to a function space and its low-frequency 
part is given. 

3.5. Average risk. One of the long-standing problems in the sharp
estimation literature is to find an estimate $\hat g$ of the density $g(y)$
of direct observations $Y_1, Y_2, \ldots, Y_n$ that minimizes an average
risk with the averaging function $a(y)$. In other words, this estimate
should minimize the average risk

(3.2)   $\mathrm{AR} = E_g\Big\{ \int_0^1 a(y) (\hat g(y) - g(y))^2\,dy \Big\}$.

Let us assume that the averaging function satisfies
$0 < C_* < a(y) < C^* < \infty$. Then it is easy to see that if we define

(3.3)   $w(y) = 1/\sqrt{a(y)}$,   $f(y) = \mu(f) w^{-1}(y) g(y)$,   $\hat f(y) = \mu(f) w^{-1}(y) \hat g(y)$,

then

(3.4)   $\mathrm{AR} = \mu^{-2}(f) E_f\Big\{ \int_0^1 (\hat f(x) - f(x))^2\,dx \Big\}$.

Here $f$ and $g$ can be thought of as the underlying and the biased
densities. Particular examples of using this equivalence for finding sharp
asymptotics are presented in Efromovich (2004).
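
The identity behind (3.2)-(3.4) holds for every realization of the estimate, not only in expectation, which makes it easy to check numerically. The Python sketch below does so for one hypothetical choice of the averaging function $a$, the density $g$ and an estimate $\hat g$; all of these, and the helper `integrate`, are assumptions chosen only for illustration.

```python
import math

def integrate(func, steps=2000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))

a = lambda y: 1.0 + y                                  # hypothetical averaging function
w = lambda y: 1.0 / math.sqrt(a(y))                    # biasing function from (3.3)
g = lambda y: 1.0                                      # hypothetical density of Y
g_hat = lambda y: 1.0 + 0.2 * math.cos(math.pi * y)    # hypothetical estimate of g

mu = 1.0 / integrate(lambda y: g(y) / w(y))            # makes f integrate to 1
f = lambda y: mu * g(y) / w(y)                         # underlying density in (3.3)
f_hat = lambda y: mu * g_hat(y) / w(y)                 # induced estimate of f

weighted_ise = integrate(lambda y: a(y) * (g_hat(y) - g(y)) ** 2)
rescaled_ise = integrate(lambda y: (f_hat(y) - f(y)) ** 2) / mu ** 2
```

The two quantities `weighted_ise` and `rescaled_ise` agree up to rounding because $(\hat f - f)^2 = \mu^2 w^{-2} (\hat g - g)^2 = \mu^2 a\, (\hat g - g)^2$ pointwise; taking expectations gives (3.4).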

3.6. Naive estimation. Using a naive estimator
$\tilde f_n(x) = \hat g_n(x) \hat\mu w^{-1}(x)$ is a popular and
intuitively clear idea [see the discussion in Wu and Mao (1996) and Wu
(1997)]. Here $\hat g_n$ is an estimator of the density $g$ of biased
observations $Y_1, \ldots, Y_n$. Section 2 implies that smoothness of the
biasing function plays a crucial role in the accuracy of the naive
estimator. The naive estimator is rate inadmissible whenever the biasing
function is not as smooth as the underlying density $f$. On the other hand,
specific examples where naive estimation is sharp minimax can be found in
Efromovich (2004). We may conclude that because smoothness of $f$ is
typically unknown, it is better to avoid the use of naive estimation.

It is also important to note that smoothness of the biasing function does 
not affect optimal estimation. Thus, even if the biased distribution is dis- 
continuous or not differentiable, the quality of sharp minimax estimation of 
the underlying density $f$ is defined only by its own smoothness. The only
functional of w that affects the estimation is the coefficient of difficulty due 
to biasing, discussed in the next section. 

3.7. Coefficient of difficulty. An interesting theoretical outcome of Sec- 
tion 2 is that, for a biased sample of size n, the same precision of estimation 
is achievable by a direct sample of size n' = n/RCDB , where RCDB is the 
relative coefficient of difficulty due to biasing defined in (2.5). Recall that the 
notion of the coefficient of difficulty was introduced in Efromovich (1999). 

Thought-provoking examples in Cox (1969) indicate that biasing can al- 
ways improve or worsen the estimation. Translated into the nonparametric 
setting considered, this would imply that biasing can always increase or 
decrease the RCDB. 

Let us present an example where RCDB is always greater than 1, that is,
the example where biasing always worsens the density estimation. According
to the Cauchy-Schwarz inequality,

(3.5)   $\Big[ \int_{-\infty}^{\infty} I(0 \le x \le 1) f(x)\,dx \Big]^2 \le \int_{-\infty}^{\infty} f(x) w(x)\,dx \int_0^1 f(x) w^{-1}(x)\,dx$

with equality iff $w(x) = c I(0 \le x \le 1)$, $c > 0$, almost surely with
respect to $f$. Thus, if $[0, 1]$ is the support of $X$, then any biasing
yields RCDB > 1, that is, biasing always worsens the density estimation.
Otherwise, RCDB > $\int_0^1 f(x)\,dx$ and, similarly to examples in Cox
(1969), biasing can improve or worsen the estimation.
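
A small worked case shows how RCDB can fall below 1 when the support of $X$ extends beyond $[0, 1]$. The Python sketch below evaluates (2.5) in closed form for a piecewise-constant biasing function; the function name and the particular numbers are hypothetical and chosen only to illustrate the bound RCDB > $\int_0^1 f(x)\,dx$.

```python
def rcdb_piecewise(c_in, c_out, mass_in):
    """RCDB from (2.5) when f puts mass_in on [0, 1] and w is piecewise constant.

    The biasing function equals c_in on [0, 1] and c_out elsewhere
    (an illustrative special case, not a setting from the article).
    """
    mu = c_in * mass_in + c_out * (1.0 - mass_in)   # integral of f w over the line
    inv = mass_in / c_in                            # integral of f / w over [0, 1]
    return mu * inv / mass_in

improves = rcdb_piecewise(1.0, 0.2, 0.5)   # observations outside [0, 1] are rare
worsens = rcdb_piecewise(1.0, 5.0, 0.5)    # observations outside [0, 1] are favored
on_support = rcdb_piecewise(2.0, 0.2, 1.0) # [0, 1] is the support, constant w there
```

In this special case RCDB reduces to $p + (1 - p) c_{\mathrm{out}}/c_{\mathrm{in}}$ with $p = \int_0^1 f(x)\,dx$, so it equals 0.6 in the first call (biasing helps), 3 in the second (biasing hurts) and 1 in the third, and it never drops below $p$.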

This is a useful conclusion for practitioners because, as we shall see in 
Section 3.9, the asymptotic RCDB can be used for the analysis of small 
data sets. 

3.8. Versatility of the Efromovich-Pinsker estimator. It is well known 
that many functionals of this estimator (including derivatives and integrals) 
are optimal estimators of the corresponding functionals, that is, the estima- 
tor is versatile [see the discussion in Efromovich (1999), Chapter 7]. It is 
possible to establish that the same conclusion holds for the case of biased 
data; and results will be published elsewhere. 

3.9. Adaptive estimation: from asymptotic to small sample sizes. Asymp- 
totic results presented in Section 2 show that the Efromovich-Pinsker series 
estimator is asymptotically minimax. This justifies the use of software de- 
veloped in Efromovich [(1999), Chapter 3] for small data sets. The software, 
which contains both a generator of biased data sets and the adaptive estima- 
tor for small biased data sets, is available over the Worldwide Web [see the 
instructions on how to download and use it in Efromovich (1999), Appendix 
B or e-mail the author]. 

Using this software, let us shed light on the nature of a biased sampling 
and then comment on the possibility of using the coefficient of difficulty 
RCDB for small data sets. 

Figure 1 presents a particular biased data set of size $n = 25$ shown by
letters Y. The underlying density $f$ is the Normal density shown by the
solid line and defined in Efromovich [(1999), page 18]. The sample is biased
by the biasing function $w(y) = 0.1 + 0.9y$ shown by the long-dashed line,
that is, the data may be referred to as length biased. The right-skewed data
set clearly exhibits the effect of this biasing. To exhibit the structure of
the data set, the short-dashed line shows us its estimated density [the
adaptive estimate of Efromovich (1999) is used]. This is what the
statistician might see if the biased nature of the data were ignored or
unknown. Note how the skewed density of Y's differs from the symmetric
Normal density.

The dotted line shows the suggested adaptive biased-data density esti- 
mate. By taking into account the biasing, the estimate correctly restored 
the symmetric about 0.5 shape of the underlying density. It also removed 
the heavy left tail of the estimated density of Y's created by the three small- 
est length-biased observations. 

Fig. 1. Analysis of biased data. The solid and long-dashed lines show the
underlying Normal density $f$ and the biasing function $w(y) = 0.1 + 0.9y$,
respectively. A simulated biased data set of size $n = 25$ is shown by
letters Y. The dotted line shows the estimate of $f$ (the estimate based on
Y's and the biasing function). The short-dashed line shows the estimate of
$g$, that is, of the density of the biased observations $Y$.

This particular simulation together with the discussion in Section 3.7
raises the following question. Suppose that $n'$ is a sample size that
implies a reasonable estimation of an underlying density based on direct
observations. Then what is the corresponding sample size $n$ for a biased
data set that implies a similar precision of estimation in terms of mean
integrated squared error? According to the asymptotic results of Section 2,
$n'$ times RCDB should be the answer, but can this asymptotic rule be used
for small sample sizes?

Let us begin the discussion with a particular simulation and then com- 
plement it with an intensive Monte Carlo study. 

Figure 2 explains the problem explored. The underlying density is the 
monotone one shown by the solid line and defined in Efromovich [(1999), 
page 18]. A direct sample of size n' = 25 from this density is shown by X's. 
Note that the sample correctly represents the underlying density, and this 
also can be seen from the estimate of the density shown by the short-dashed 
line. 

A biased sample of size $n = 44$ from the same density is shown by Y's. The
utilized biasing function is $w(y) = 1 - 0.95y$, and this implies
RCDB = 1.74 and the above-mentioned sample size
$n = n' \times \mathrm{RCDB} = 44$. If the asymptotic theory holds for these
small sample sizes, then the density estimation based on 25 direct and 44
biased data values should be similar in terms of mean integrated squared
error. For the particular samples the estimates for direct and biased
samples are shown by the short-dashed and dotted lines, respectively. The
short-dashed line better exhibits the underlying density, and it may look
like the 1.74-fold increase in the sample size is not large enough to
compensate for the biasing. On the other hand, let us recall that the
samples are independent and another simulation may change the outcome.
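
The sample-size rule used here is plain arithmetic, and a short Python sketch makes the rounding explicit. Rounding up to the next integer is an assumption of this illustration; it reproduces the sizes 44 and 27 reported in the numerical study for RCDB = 1.74 and RCDB = 1.07 with $n' = 25$.

```python
import math

def biased_sample_size(n_direct, rcdb):
    """Smallest integer n with n >= n_direct * RCDB (rounding up is assumed)."""
    return math.ceil(n_direct * rcdb)

n_monotone = biased_sample_size(25, 1.74)   # monotone density, w(y) = 1 - 0.95y
n_normal = biased_sample_size(25, 1.07)     # Normal density, w(y) = 0.1 + 0.9y
```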

Let us repeat this particular simulation 500 times, calculate corresponding
integrated squared errors (ISEs) and then analyze them. The results are
presented in Figure 3. The top diagram shows by character 1 ISEs for direct
data sets and by 2 for biased data sets. Let us repeat that all samples are
independent. Clearly a majority of ISEs are relatively small but there is a
thin right tail in their distribution. Thus, we show densities of ISEs over
two subintervals: ISEs that are at most 0.11 and larger ISEs. The value 0.11
is the mean (up to the rounded second digit) of both sets of ISEs (here we
have an ideal outcome in terms of the empirical mean integrated squared
errors).

[Figure 2: plot titled "Biased and Direct Data"; horizontal axis from 0.0 to 1.0.]

Fig. 2. Analysis of biased and direct data. The solid and long-dashed lines
show the underlying monotone density $f$ and the biasing function
$w(y) = 1 - 0.95y$, respectively. The corresponding RCDB = 1.74. A simulated
direct data set of size $n' = 25$ from the underlying monotone density is
shown by letters X. A biased data set of adjusted size
$n = n' \times \mathrm{RCDB} = 44$ is shown by letters Y. The dotted line
shows the biased-data density estimate (the estimate is based on Y's and the
biasing function) and the short-dashed line shows the density estimate of X.

The densities for these two groups of ISEs are shown in the middle and 
bottom diagrams. As we see, the distributions are practically identical, and 
while there is no asymptotic theory to support this outcome, it is an inter- 
esting empirical observation. 

[Figure 3: three panels titled "Observed ISEs" (horizontal axis: experiment), "Densities of ISEs from [0, 0.11]" and "Densities of ISEs from (0.11, 0.8]".]

Fig. 3. Results of 500 Monte Carlo simulations identical to the one shown
in Figure 2. Characters 1 and 2 in the top diagram show ISEs of the
estimates based on 25 direct and 44 biased observations, respectively. The
sample means are identical (up to the second digit) and equal to 0.11. The
solid and dotted lines in the bottom diagrams show the densities of ISEs for
the direct and biased samples, respectively. The two bottom diagrams show
the densities for ISEs that are at most the sample mean 0.11 and larger than
the sample mean, respectively. From the totals of 500, there are 348 and 357
ISEs that are at most 0.11 for the direct and biased samples, respectively.

Figure 4 shows an outcome of a similar study only with the underlying
density and the biasing function utilized in Figure 1. The main parameters
are presented in the caption, and here let us stress only the relatively
small RCDB = 1.07. As we see, the outcome is very similar. The sample means
are a bit different (they are 0.09 and 0.10 for the direct and biased
samples, resp.), but it is clear that the difference is primarily due to the
tails. Repeated simulations show that this is indeed the case.

The numerical study supports the possibility of using RCDB as a measure 
of difficulty due to biasing. The interested reader can find a different numer- 
ical study, which implies a similar outcome for a wider variety of densities 
and biasing functions, in Efromovich (2004). 

4. Proofs. Recall that $\varphi_0(x) = 1$,
$\varphi_j(x) = \sqrt{2} \cos(\pi j x)$, $j \ge 1$, and
$\lfloor x \rfloor$ is the rounded down $x$.

Proof of Theorem 1. This proof will be also used to verify Theorem 
2, and this explains some steps and comments not directly related to the 
proof. 

[Figure 4: panels titled "Observed ISEs" (horizontal axis: experiment) and "Densities of ISEs from [0, 0.11]".]

Fig. 4. A numerical study similar to the one shown in Figure 3, only here
the density and the biasing function of Figure 1 are used. RCDB = 1.07, and
this implies 25 direct and 27 biased observations. Mean ISEs are 0.09 and
0.10, respectively. From the totals of 500, there are 383 and 362 ISEs that
are at most 0.11 for the direct and biased samples, respectively.

We begin by dividing the unit interval into $s$ subintervals where the
biasing function $w$ and the density $f_0$ are approximated by simple
functions. This allows us to obtain relatively simple lower minimax bounds
for each subinterval.

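Set $s = 1 + \lfloor \ln(\ln(n+20)) \rfloor$ and define

$$\mathcal{H}_s = \Big\{ f : f(x) = f_0(x) + \Big[\sum_{k=0}^{s-1} f_k(x) - \int_0^1 \sum_{k=0}^{s-1} f_k(u)\,du\Big] 1(0 \le x \le 1),\ f_k(x) \in \mathcal{H}_{sk},\ f \ge 0 \Big\}.$$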
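To get a feeling for how slowly the number of subintervals s = 1 + floor(ln(ln(n + 20))) grows with the sample size, one can evaluate it directly. The sketch below is only an illustration; the function name is hypothetical.

```python
import math

def number_of_subintervals(n):
    """s = 1 + floor(ln(ln(n + 20))) as used in the proof of Theorem 1."""
    return 1 + math.floor(math.log(math.log(n + 20)))

for n in (100, 10**6, 10**9):
    print(n, number_of_subintervals(n))
```

Even for a billion observations the unit interval is split into only four subintervals, which is what allows the remainder terms of order 1/s in the proof to vanish while s stays negligible relative to any power of ln(n).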



Here the function classes $\mathcal{H}_{sk}$ are defined as follows. Let $\phi(x) = \phi(n,x)$ be a sequence of flattop nonnegative kernels defined on the real line such that, for a given $n$, the kernel is zero beyond $(0,1)$, it is $m$-fold continuously differentiable on $(-\infty,\infty)$, $0 \le \phi(x) \le 1$, $\phi(x) = 1$ for $2(\ln(n))^{-2} \le x \le 1 - 2(\ln(n))^{-2}$ and $|\phi^{(m)}(x)| \le C(\ln(n))^{2m}$. For instance, such a kernel may be constructed using so-called mollifiers discussed in Efromovich [(1999), Chapter 7]. Then set $\phi_{sk}(x) = \phi(sx - k)$. For the $k$th subinterval, $0 \le k \le s-1$, define $\varphi_{skj}(x) = \sqrt{s}\,\varphi_j(sx - k)$,

$$f_{[k]}(x) = \sum_{j=\lfloor J(k)/\ln(n)\rfloor}^{J(k)} \nu_{skj}\varphi_{skj}(x), \qquad f_{(k)}(x) = f_{[k]}(x)\phi_{sk}(x),$$

$$J(k) = 2\big\lfloor [n(2m+1)(m+1)s^{-2m}Q_{sk}I_{sk}(2m(2\pi)^{2m})^{-1}]^{1/(2m+1)} \big\rfloor,$$

$Q_{sk} = Q(1 - 1/s)(I' I_{sk})^{-1}$, $I_{sk} = \mu^{-1}(f_0)w(k/s)/f_0(k/s)$, and $I' = \sum_{k=0}^{s-1}(1/I_{sk})$. Then we define the subclasses

$$\mathcal{H}_{sk} = \Big\{ f : f(x) = f_{(k)}(x),\ \sum_{j=\lfloor J(k)/\ln(n)\rfloor}^{J(k)} (\pi j)^{2m}\nu_{skj}^2 \le s^{-2m}Q_{sk},\ |f_{[k]}(x)|^2 \le s^{3}\ln(n)J(k)n^{-1} \Big\}.$$

Let us verify that, for sufficiently large $n$, this set of densities is a subset of the studied class $\mathcal{D}$.

First, the definition of the flattop kernel implies that $f - f_0$ is $m$-fold continuously differentiable over $[0,1]$. Second, let us verify that for $f \in \mathcal{H}_s$ the difference $f - f_0$ belongs to $S(m,Q)$. By the Leibniz rule,

$$\int_0^1 \big[(f_{[k]}(x)\phi_{sk}(x))^{(m)}\big]^2 dx = \int_0^1 \Big[\sum_{l=0}^{m} C_l^m f_{[k]}^{(m-l)}(x)\phi_{sk}^{(l)}(x)\Big]^2 dx,$$

where $C_l^m = m!/((m-l)!\,l!)$. Recall that $\max_{0 \le l \le m}\int_0^1 (\phi_{sk}^{(l)}(x))^2 dx \le C(s(\ln(n))^2)^{2m}$ and, for $0 < l \le m$,

$$\begin{aligned}
\big[f_{[k]}^{(m-l)}(x)\big]^2 &= \Big[\sum_{j=\lfloor J(k)/\ln(n)\rfloor}^{J(k)} \nu_{skj}\varphi_{skj}^{(m-l)}(x)\Big]^2 \\
&\le \sum_{j=\lfloor J(k)/\ln(n)\rfloor}^{J(k)} (\pi j)^{2m}\nu_{skj}^2 \sum_{j=\lfloor J(k)/\ln(n)\rfloor}^{J(k)} (\pi j)^{-2m}\big[\varphi_{skj}^{(m-l)}(x)\big]^2 \\
&= o(1)(J(k))^{-1/2},
\end{aligned} \tag{4.1}$$
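where the Cauchy-Schwarz inequality was used in the middle line. Also,

$$\int_0^1 \big[f_{[k]}^{(m)}(x)\phi_{sk}(x)\big]^2 dx \le \int_{k/s}^{(k+1)/s} \big[f_{[k]}^{(m)}(x)\big]^2 dx \le Q_{sk}, \tag{4.2}$$

and recall that $\sum_{k=0}^{s-1} Q_{sk} = Q(1 - s^{-1})$. These results imply $f - f_0 \in S(m, Q(1 - s^{-1}))$ for $f \in \mathcal{H}_s$ and large $n$.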
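Now denote

$$\hat f = f_0 + \tilde f \quad\text{and}\quad \delta_s = \sum_{k=0}^{s-1}\int_0^1 f_{(k)}(u)\,du,$$

and note that for $f \in \mathcal{H}_s$ and any $\gamma > 0$,

$$\begin{aligned}
\int_{k/s}^{(k+1)/s} (\hat f(x) - f(x))^2 dx
&= \int_{k/s}^{(k+1)/s} (\tilde f(x) - f_{(k)}(x) + \delta_s)^2 dx \\
&\ge (1-\gamma)\int_{k/s}^{(k+1)/s} (\tilde f(x) - f_{[k]}(x))^2 dx
 - \gamma^{-1}\int_{k/s}^{(k+1)/s} \big[f_{[k]}(x)(1 - \phi_{sk}(x)) + \delta_s\big]^2 dx \\
&\ge (1-\gamma)\int_{k/s}^{(k+1)/s} (\tilde f(x) - f_{[k]}(x))^2 dx
 + o(1)\gamma^{-1}(\ln(n))^{-1/2}n^{-2m/(2m+1)}.
\end{aligned}$$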

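Set $\gamma = s^{-1}$ and, using the above-obtained relationship, we get

$$\begin{aligned}
\sup_{f \in \mathcal{D}(m,Q,f_0,\rho)} E_f\Big\{\int_0^1 (\hat f(x) - f(x))^2 dx\Big\}
&\ge \sup_{f \in \mathcal{H}_s} E_f\Big\{\int_0^1 (\hat f(x) - f(x))^2 dx\Big\} \\
&= \sup_{f \in \mathcal{H}_s} \sum_{k=0}^{s-1} E_f\Big\{\int_{k/s}^{(k+1)/s} (\hat f(x) - f(x))^2 dx\Big\} \\
&\ge (1 - s^{-1})\sum_{k=0}^{s-1}\ \sup_{f \in \mathcal{H}_{sk}} \sum_{j=\lfloor J(k)/\ln(n)\rfloor}^{J(k)} E_f\{(\hat\nu_{skj} - \nu_{skj})^2\} + o(1)n^{-2m/(2m+1)} \\
&= (1 - s^{-1})\sum_{k=0}^{s-1} R_k + o(1)n^{-2m/(2m+1)},
\end{aligned}$$

where $\hat\nu_{skj} = \int_{k/s}^{(k+1)/s} \tilde f(x)\varphi_{skj}(x)\,dx$.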

To estimate $R_k$, following the proof of Theorem 1 in Efromovich (1989), we make two additional steps. The first one is to verify that if $\zeta_{skj}$ are independent normal random variables with zero mean and variance $(1-\gamma)\nu_{skj}^2$, where here $\gamma = \gamma_n$ tends to zero as slowly as desired, then a stochastic process $f^*(x)$, defined as the studied $f(x)$ but with $\zeta_{skj}$ in place of $\nu_{skj}$, satisfies the relationship

$$P(f^*(x) \in H(m,Q)) = 1 + o(1), \tag{4.3}$$

and if additionally $\nu_{skj}^2 \le sn^{-1}$, then a similarly defined stochastic process $f^*_{[k]}$ satisfies

$$P\Big(\sup_{x \in [0,1]} |f^*_{[k]}(x)|^2 \le s^{3}\ln(n)J(k)n^{-1}\Big) = 1 + o(1). \tag{4.4}$$

The second step is to compute for $f \in \mathcal{H}_s$ the classical parametric Fisher information

$$I_{skj} = E_f\big\{[\partial \ln(f(Y)w(Y)/\mu(f))/\partial\nu_{skj}]^2\big\}. \tag{4.5}$$

Relationship (4.3) follows from (A.18) in Pinsker (1980). Also, for $\nu_{skj}^2 \le sn^{-1}$, the inequality

$$\sum_{j=\lfloor J(k)/\ln(n)\rfloor}^{J(k)} \sup_x\,[\nu_{skj}\varphi_{skj}(x)]^2 \ln(sJ(k)) \le Cs^{2}n^{-1}J(k)\ln(n)$$

holds and this together with Theorem 6.2.2 in Kahane (1985) yields (4.4).

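Now we are in a position to calculate the Fisher information (4.5). To simplify notation, let us denote $\nu_{skj} = \theta$ and the corresponding density by $f_\theta$. Write

$$\frac{\partial \ln(g_\theta(u))}{\partial\theta} = \frac{\partial \ln(w(u)f_\theta(u)/\mu(f_\theta))}{\partial\theta}.$$

Here $f'_\theta(u) = \partial f_\theta(u)/\partial\theta$ and $\mu'(f_\theta) = \partial\mu(f_\theta)/\partial\theta$. This implies that

$$\begin{aligned}
\Big[\frac{\partial \ln(g_\theta(u))}{\partial\theta}\Big]^2 g_\theta(u)
&= w(u)\mu^{-3}(f_\theta)\big[\mu^{2}(f_\theta)(f'_\theta(u))^2 f_\theta^{-1}(u) - 2\mu(f_\theta)\mu'(f_\theta)f'_\theta(u) + (\mu'(f_\theta))^2 f_\theta(u)\big] \\
&= w(u)\mu^{-1}(f_\theta)(f'_\theta(u))^2 f_\theta^{-1}(u) - 2\mu^{-2}(f_\theta)w(u)\mu'(f_\theta)f'_\theta(u) + \mu^{-3}(f_\theta)w(u)(\mu'(f_\theta))^2 f_\theta(u) \\
&= T_1(u) + T_2(u) + T_3(u).
\end{aligned} \tag{4.6}$$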

Recall that $\phi_{sk}(x)$ is supported on $[k/s, (k+1)/s]$ and we are estimating three components of the Fisher information that correspond to the three terms on the right-hand side of (4.6). Write

$$T_1 = \int_{k/s}^{(k+1)/s} w(u)\mu^{-1}(f_0)f_0^{-1}(u)(1 + o(1))\Big[\varphi_{skj}(u)\phi_{sk}(u) - \int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\phi_{sk}(z)\,dz\Big]^2 du.$$



To estimate $T_1$ we use the following three relationships. Write

$$\int_{k/s}^{(k+1)/s} [\varphi_{skj}(x)\phi_{sk}(x)]^2 dx = 1 + \int_{k/s}^{(k+1)/s} \varphi_{skj}^2(x)(\phi_{sk}^2(x) - 1)\,dx$$

and then, recalling that $\phi_{sk}(x)$ is the special flattop kernel,

$$\int_{k/s}^{(k+1)/s} \varphi_{skj}^2(x)(\phi_{sk}^2(x) - 1)\,dx = o(1)(\ln(n))^{-1}.$$

Similarly,

$$\int_{k/s}^{(k+1)/s} \varphi_{skj}(x)\phi_{sk}(x)\,dx = \int_{k/s}^{(k+1)/s} \varphi_{skj}(x)[\phi_{sk}(x) - 1]\,dx = o(1)(\ln(n))^{-1}. \tag{4.7}$$

Also, using the assumptions about the biasing function and the anchor density, we obtain that

$$T_1 = \mu^{-1}(f_0)w(k/s)f_0^{-1}(k/s)(1 + o(1)) = I_{sk}(1 + o(1)).$$

Using (4.7), the second component $T_2$ of the Fisher information can be estimated as

$$T_2 = -2\mu^{-2}(f_0)\mu'(f_0)\int_0^1 w(u)\Big[\varphi_{skj}(u)\phi_{sk}(u) - \int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\phi_{sk}(z)\,dz\Big]du\,(1 + o(1)) = o(1).$$

To estimate $T_3$ we write

$$\mu'(f_\theta) = \int_0^1 f'_\theta(u)w(u)\,du = \int_0^1 \Big[\varphi_{skj}(u)\phi_{sk}(u) - \int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\phi_{sk}(z)\,dz\Big]w(u)\,du = o(1).$$

This yields $T_3 = o(1)$. Combining these results, we obtain that

$$I_{skj} = [\mu^{-1}(f_0)w(k/s)/f_0(k/s)](1 + o(1)) = I_{sk}(1 + o(1)).$$

Now we can straightforwardly follow the proof of Theorem 1 in Efromovich (1989). This yields, for $k \in \{0, 1, \ldots, s-1\}$, that

$$\inf R_k \ge (s^{-2m}Q_{sk})^{1/(2m+1)}(nI_{sk})^{-2m/(2m+1)}P(1 + o(1)),$$

where the infimum is over all possible nonparametric estimates of $f$ considered in the theorem, and $P = (2m/(2\pi(m+1)))^{2m/(2m+1)}(2m+1)^{1/(2m+1)}$ is the Pinsker constant. Thus,

$$\begin{aligned}
\inf \sum_{k=0}^{s-1} R_k
&\ge PQ^{1/(2m+1)}\Big[n^{-1}s^{-1}\mu(f_0)\sum_{k=0}^{s-1} f_0(k/s)/w(k/s)\Big]^{2m/(2m+1)}(1 + o(1)) \\
&= PQ^{1/(2m+1)}\Big[n^{-1}\mu(f_0)\sum_{k=0}^{s-1}\int_{k/s}^{(k+1)/s}(f_0(x)/w(x))\,dx\Big]^{2m/(2m+1)}(1 + o(1)) \\
&= PQ^{1/(2m+1)}\Big(n^{-1}\mu(f_0)\int_0^1 (f_0(x)/w(x))\,dx\Big)^{2m/(2m+1)}(1 + o(1)).
\end{aligned}$$

Theorem 1 is proved. □
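The leading term of this lower bound is easy to evaluate numerically. The sketch below computes the Pinsker constant P and the quantity P Q^{1/(2m+1)} (d/n)^{2m/(2m+1)}, where d stands for mu(f_0) * int_0^1 f_0(x)/w(x) dx; the function names and the numerical values of m, Q and d are hypothetical and serve only as an illustration.

```python
import math

def pinsker_constant(m):
    """P = (2m / (2 pi (m + 1)))^{2m/(2m+1)} * (2m + 1)^{1/(2m+1)}."""
    exponent = 2 * m / (2 * m + 1)
    return (2 * m / (2 * math.pi * (m + 1))) ** exponent * (2 * m + 1) ** (1 / (2 * m + 1))

def lower_bound_leading_term(n, m, Q, d):
    """Leading term P * Q^{1/(2m+1)} * (d / n)^{2m/(2m+1)} of the bound in Theorem 1.
    Here d denotes mu(f_0) times the integral of f_0/w (d = 1 for direct data)."""
    return pinsker_constant(m) * Q ** (1 / (2 * m + 1)) * (d / n) ** (2 * m / (2 * m + 1))

# Hypothetical values: smoothness m = 2, Sobolev radius Q = 1.
direct = lower_bound_leading_term(25, 2, 1.0, 1.0)    # 25 direct observations
biased = lower_bound_leading_term(27, 2, 1.0, 1.07)   # 27 biased observations, d = 1.07
```

Because the bound depends on n and d only through the ratio d/n, multiplying the direct sample size by d yields the same leading term, which is the sense in which d measures the difficulty due to biasing.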

Proof of Theorem 2. This proof follows along the lines of the proof of Theorem 1. Necessary changes are as follows. First, a new class $\mathcal{H}_s$ is introduced,

$$\mathcal{H}_s = \Big\{ f : f(x) = f_{0J_n}(x) + \Big[\sum_{k=0}^{s-1} f_k(x) - \int_0^1 \sum_{k=0}^{s-1} f_k(z)\Big(1 + \sum_{i=1}^{J_n}\varphi_i(z)\varphi_i(x)\Big)dz\Big],\ x \in [0,1],\ f_k \in \mathcal{H}_{sk},\ f(x) = f_0(x),\ x \notin [0,1],\ f \ge 0 \Big\}, \tag{4.8}$$

with no change in $\mathcal{H}_{sk}$ except for using $q_nQ$ in place of $Q$. Note that according to Assumption A the sequence $q_n$ decreases at most logarithmically and thus $J_n = o(1)J(k)$. Thus this class is defined correctly.

Second, we verify that for large $n$ the inclusion $\mathcal{H}_s \subset \mathcal{H}(m,Q,f_0,J_n)$ holds. Denote

$$\theta_j = \int_0^1 (f(u) - f_{0J_n}(u))\varphi_j(u)\,du, \qquad f \in \mathcal{H}_s.$$

Note that $\theta_j = 0$, $1 \le j \le J_n$, and thus the inclusion follows from the inequality

$$\sum_{j > J_n} (\pi j)^{2m}\theta_j^2 \le q_nQ, \qquad f \in \mathcal{H}_s. \tag{4.9}$$

To prove (4.9), denote $\psi(u) = f(u) - f_{0J_n}(u)$. Because $\psi^{(s)}(0) = \psi^{(s)}(1) = 0$ for all odd $s < m$, using integration by parts implies [see Efromovich (1999), Section 2.2] $\theta_j^2 = (\pi j)^{-2m}[\int_0^1 \psi^{(m)}(u)\tilde\varphi_j(u)\,du]^2$, where $\tilde\varphi_j(u) = \varphi_j(u)$ for $m$ even and $\tilde\varphi_j(u) = \sqrt{2}\sin(\pi j u)$ for $m$ odd. This together with the Parseval identity implies, for both odd and even $m$, that

$$\sum_{j > J_n} (\pi j)^{2m}\theta_j^2 \le \int_0^1 [\psi^{(m)}(u)]^2 du.$$

Then following along the lines of (4.1) and (4.2) we get that

$$\int_0^1 [\psi^{(m)}(u)]^2 du \le q_nQ(1 - s^{-1}).$$

Inequality (4.9) is verified.

Finally note that in the estimation of $I_{skj}$ we get a new factor

$$\Big[\varphi_{skj}(u)\phi_{sk}(u) - \int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\phi_{sk}(z)\Big(1 + \sum_{i=1}^{J_n}\varphi_i(z)\varphi_i(u)\Big)dz\Big]^2$$

in place of

$$\Big[\varphi_{skj}(u)\phi_{sk}(u) - \int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\phi_{sk}(z)\,dz\Big]^2.$$

To evaluate this new factor, we write, similarly to (4.7),

$$\begin{aligned}
\int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\phi_{sk}(z)\Big(1 + \sum_{i=1}^{J_n}\varphi_i(z)\varphi_i(u)\Big)dz
&= o(1)/\ln(n) + \int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\Big(1 + \sum_{i=1}^{J_n}\varphi_i(z)\varphi_i(u)\Big)dz \\
&= o(1)/\ln(n) + \sum_{i=1}^{J_n}\varphi_i(u)\int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\varphi_i(z)\,dz.
\end{aligned}$$

Relationship (2.2.7) in Efromovich (1999) implies that

$$\Big|\int_{k/s}^{(k+1)/s}\varphi_{skj}(z)\varphi_i(z)\,dz\Big| \le Cj^{-2}i.$$

This inequality allows us to conclude that $I_{skj} = I_{sk}(1 + o(1))$, and then we finish the proof following along the lines of the proof of Theorem 1. □

Proof of Theorem 3. Here we are verifying the more complicated 
assertion (2.15). The assertion (2.14) is verified similarly and its proof is 
skipped. 

The method of establishing sharp optimality of the Efromovich-Pinsker 
estimator is well developed and it consists of several steps. First of all, sharp 
optimality of a pseudoestimate is established. This is the step that should 
be verified for each particular problem. Then this estimate is mimicked by a 
so-called linear oracle that always performs better than the estimate. This 
step is easily verified. The third step is to show that the Efromovich-Pinsker 
blockwise oracle sharply mimics the linear oracle. For the particular blocks 
and thresholds considered in Section 2, this step is verified in Efromovich 
(1985) and it is well known. Finally, it should be shown that the Efromovich- 
Pinsker estimator mimics the Efromovich-Pinsker oracle. The validity of this 
step follows from Efromovich (1985, 2000). 

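Thus in what follows we verify steps 1 and 2. Consider a pseudoestimate

$$\tilde f_n(u) = f_{0J_n}(u) + \sum_{j=J_n+1}^{J^*}[1 - (j/J^*)^m]\hat\theta_j\varphi_j(u), \tag{4.10}$$

where $\hat\theta_j$ is defined in (2.12) and $J^*$ is the rounded up $[nd^{-1}(f,w)(2m+1)(m+1)q_nQ(2m(2\pi)^{2m})^{-1}]^{1/(2m+1)}$, $d(f,w) = \mu(f)\int_0^1 f(u)w^{-1}(u)\,du$. Then, for $f \in \mathcal{H}(m,Q,f_0,J_n)$,

$$\begin{aligned}
E_f\int_0^1 (\tilde f_n(u) - f(u))^2 du
&= \sum_{j=J_n+1}^{J^*} E_f\{[(1 - (j/J^*)^m)\hat\theta_j - \theta_j]^2\} + \sum_{j > J^*}\theta_j^2 \\
&= \sum_{j=J_n+1}^{J^*}\big[(1 - (j/J^*)^m)^2E_f\{(\hat\theta_j - \theta_j)^2\} - 2(1 - (j/J^*)^m)(j/J^*)^m\theta_j(E_f\{\hat\theta_j\} - \theta_j) + (j/J^*)^{2m}\theta_j^2\big] + \sum_{j > J^*}\theta_j^2.
\end{aligned} \tag{4.11}$$

Denote $\tilde\theta_j = n^{-1}\mu\sum_{l=1}^{n} 1(0 \le Y_l \le 1)w^{-1}(Y_l)\varphi_j(Y_l)$, that is, $\tilde\theta_j$ is the Cox empirical estimate $\hat\theta_j$ with $\hat\mu$ replaced by $\mu$. Then using the elementary identity

$$\hat\theta_j = \tilde\theta_j + \mu^{-1}(\hat\mu - \mu)\tilde\theta_j$$

we get [in what follows $\mu = \mu(f)$]

$$\begin{aligned}
E_f\{\hat\theta_j\} &= \theta_j + \mu^{-1}E_f\{(\hat\mu - \mu)\theta_j + (\hat\mu - \mu)(\tilde\theta_j - \theta_j)\} \\
&= \theta_j - \theta_jE_f\{\hat\mu(\hat\mu^{-1} - \mu^{-1})\} + \mu^{-1}E_f\{(\hat\mu - \mu)(\tilde\theta_j - \theta_j)\}.
\end{aligned} \tag{4.12}$$

Also

$$E_f\{(\hat\theta_j - \theta_j)^2\} = E_f\{(\tilde\theta_j - \theta_j)^2\} + 2\mu^{-1}E_f\{(\tilde\theta_j - \theta_j)(\hat\mu - \mu)\tilde\theta_j\} + \mu^{-2}E_f\{(\hat\mu - \mu)^2\tilde\theta_j^2\}. \tag{4.13}$$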

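Using trigonometric relationships (3.1.7) and (3.1.8) in Efromovich (1999), we get

$$E_f\{(\tilde\theta_j - \theta_j)^2\} \le n^{-1}\Big[d(f,w) + \mu\int_0^1 f(u)w^{-1}(u)2^{-1/2}\varphi_{2j}(u)\,du\Big], \tag{4.14}$$

$$E_f\{(\tilde\theta_j - \theta_j)(\hat\mu - \mu)\tilde\theta_j\} = \theta_jE_f\{(\tilde\theta_j - \theta_j)(\hat\mu - \mu)\} + E_f\{(\tilde\theta_j - \theta_j)^2(\hat\mu - \mu)\} \tag{4.15}$$

and

$$E_f\{(\hat\mu - \mu)^2\tilde\theta_j^2\} = \theta_j^2E_f\{(\hat\mu - \mu)^2\} + E_f\{(\hat\mu - \mu)^2(\tilde\theta_j^2 - \theta_j^2)\}. \tag{4.16}$$

Then using the Cauchy-Schwarz inequality,

$$E_f\{(\hat\mu - \mu)^4\} \le Cn^{-2}, \qquad E_f\{(\tilde\theta_j - \theta_j)^4\} \le Cn^{-2},$$

we get

$$E_f\{(\hat\theta_j - \theta_j)^2\} \le n^{-1}d(f,w) + n^{-1}\kappa_j + Cn^{-3/2}, \tag{4.17}$$

where $\sum_{j=1}^{\infty}|\kappa_j| < \infty$.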

Combining all these results in (4.11) we get

$$E_f\int_0^1 (\tilde f_n(u) - f(u))^2 du \le \sum_{j=J_n+1}^{J^*}(1 - (j/J^*)^m)^2n^{-1}d(f,w)(1 + o(1)) + Cn^{-1}\sum_{j=J_n+1}^{J^*}|\theta_j| + \sum_{j > J_n}(j/J^*)^{2m}\theta_j^2. \tag{4.18}$$

Note that $\sum_{j > J_n}|\theta_j| = o(1)$ and $(\pi J^*)^{-2m}\sum_{j > J_n}(\pi j)^{2m}\theta_j^2 \le (\pi J^*)^{-2m}q_nQ$ whenever $f \in \mathcal{H}(m,Q,f_0,J_n)$. Finally, plugging in $J^*$ and elementary calculations imply

$$E_f\int_0^1 (\tilde f_n(u) - f(u))^2 du \le P(q_nQ)^{1/(2m+1)}[d^{-1}(f,w)n]^{-2m/(2m+1)}(1 + o(1)). \tag{4.19}$$

The first step in the proof is done. Then similarly to Section 7.4.5 in Efromovich (1999), we establish that the linear oracle

$$f^*_n(u) = f_{0J_n}(u) + \sum_{j=J_n+1}^{n^{1/3}}\theta_j^2(\theta_j^2 + dn^{-1})^{-1}\hat\theta_j\varphi_j(u) \tag{4.20}$$

dominates the pseudoestimate (4.10).

Finally note that the elementary relationship 



1 7=0 J 



holds. This allows us to follow along the lines of Efromovich (1985, 2000) and to verify the above-described last two steps in the proof. □
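The ingredients of the pseudoestimate (4.10) can be illustrated on simulated biased data. The following is a minimal sketch under stated assumptions: the estimate of mu(f) is taken to be the Cox-type ratio n / sum_l 1/w(Y_l) (the exact form of (2.12) is not reproduced in this section, so this is an assumption), the underlying density is uniform on [0, 1], the biasing function is w(x) = 0.5 + x, and the biased sample is drawn by acceptance-rejection. All function names are hypothetical.

```python
import math
import random

def phi(j, x):
    """Cosine basis on [0, 1]: phi_0 = 1, phi_j = sqrt(2) cos(pi j x)."""
    return 1.0 if j == 0 else math.sqrt(2.0) * math.cos(math.pi * j * x)

def draw_biased_sample(n, w, w_max, rng):
    """Acceptance-rejection: X ~ Uniform(0, 1) is kept with probability w(X)/w_max."""
    sample = []
    while len(sample) < n:
        x = rng.random()
        if rng.random() * w_max <= w(x):
            sample.append(x)
    return sample

def mu_hat(sample, w):
    """Assumed Cox-type estimate of mu(f): n divided by the sum of 1/w(Y_l)."""
    return len(sample) / sum(1.0 / w(y) for y in sample)

def theta_hat(j, sample, w):
    """mu_hat * n^{-1} * sum of 1(0 <= Y_l <= 1) phi_j(Y_l) / w(Y_l)."""
    total = sum(phi(j, y) / w(y) for y in sample if 0.0 <= y <= 1.0)
    return mu_hat(sample, w) * total / len(sample)

def pseudo_weights(J_n, J_star, m):
    """Shrinkage weights 1 - (j/J*)^m for j = J_n + 1, ..., J*, as in (4.10)."""
    return [1.0 - (j / J_star) ** m for j in range(J_n + 1, J_star + 1)]

rng = random.Random(0)
w = lambda x: 0.5 + x                      # hypothetical biasing function
sample = draw_biased_sample(2000, w, 1.5, rng)
coefficients = [theta_hat(j, sample, w) for j in range(4)]
```

For the uniform density the true coefficients are theta_0 = 1 and theta_j = 0 for j >= 1, so the estimated coefficients should be close to these values; with the assumed form of the estimate of mu(f) the zeroth coefficient equals 1 by construction.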

Acknowledgment. Comments of a referee are appreciated.